Divisive Parallel Clustering for Multiresolution Analysis
Authors
Abstract
Clustering is a classical data analysis technique with a wide range of applications in science and engineering. For very large data sets, the performance of the clustering algorithm becomes critical. Although clustering has been studied thoroughly over the last decades, little work has exploited modern multiprocessor machines to accelerate the analysis. We propose a scalable clustering technique that takes advantage of existing parallel computers and networks of workstations. It supports the construction of multiresolution representations of very large geometric data sets. The output of the clustering process can be used for interactive data exploration and is useful for view-dependent rendering, user-guided refinement, and progressive transmission.
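The abstract does not state the splitting criterion or the parallel scheme, so the following is only a minimal Python sketch of divisive clustering that yields a multiresolution hierarchy, under stated assumptions. Each cluster is bisected at the median of its longest bounding-box axis (a kd-tree-style rule chosen purely for illustration, not necessarily the paper's criterion). The top levels of the tree are split sequentially until there is one independent subset per worker, and those subtrees are then built concurrently. All names (`Node`, `split`, `build`, `build_parallel`, `cut`, `leaves`) are hypothetical. A thread pool stands in for a parallel computer or network of workstations; in CPython it illustrates the decomposition rather than delivering real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass(eq=False)  # identity equality, so list.remove() drops exactly this node
class Node:
    points: List[Point]
    centroid: Point  # representative used for coarse levels of detail
    children: List["Node"] = field(default_factory=list)


def centroid(points: List[Point]) -> Point:
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def split(points: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Bisect at the median of the longest bounding-box axis (assumed rule)."""
    dims = len(points[0])
    extent = [max(p[d] for p in points) - min(p[d] for p in points) for d in range(dims)]
    axis = max(range(dims), key=lambda d: extent[d])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def build(points: List[Point], max_leaf: int) -> Node:
    """Sequential divisive clustering: split until clusters hold <= max_leaf points."""
    node = Node(points, centroid(points))
    if len(points) > max_leaf:
        left, right = split(points)
        node.children = [build(left, max_leaf), build(right, max_leaf)]
    return node


def build_parallel(points: List[Point], max_leaf: int, workers: int = 4) -> Node:
    root = Node(points, centroid(points))
    frontier = [root]
    # Split the largest splittable cluster until there is one subset per worker.
    while len(frontier) < workers:
        splittable = [n for n in frontier if len(n.points) > max_leaf]
        if not splittable:
            break
        node = max(splittable, key=lambda n: len(n.points))
        left, right = split(node.points)
        node.children = [Node(left, centroid(left)), Node(right, centroid(right))]
        frontier.remove(node)
        frontier.extend(node.children)
    # Frontier subtrees share no data, so each can be built by a separate worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        subtrees = list(pool.map(lambda n: build(n.points, max_leaf), frontier))
    for node, sub in zip(frontier, subtrees):
        node.children = sub.children
    return root


def cut(node: Node, depth: int) -> List[Point]:
    """Cluster representatives at a given depth: one level of detail."""
    if depth == 0 or not node.children:
        return [node.centroid]
    return [c for child in node.children for c in cut(child, depth - 1)]


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


if __name__ == "__main__":
    pts = [(float(x), float(y), float((x * y) % 7)) for x in range(10) for y in range(10)]
    root = build_parallel(pts, max_leaf=8)
    print([len(cut(root, d)) for d in range(4)])  # [1, 2, 4, 8]
```

Because every split depends only on the points of the cluster being split, the parallel build produces the same tree as the sequential one. Cutting the tree at increasing depths with `cut` gives a coarse-to-fine sequence of representatives, which is the kind of output a view-dependent renderer or a progressive transmission scheme could consume.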
Similar resources
ROCKET: A Robust Parallel Algorithm for Clustering Large-Scale Transaction Databases
We propose a robust and efficient algorithm called ROCKET for clustering large-scale transaction databases. ROCKET is a divisive hierarchical algorithm that makes the most of recent hardware architecture. ROCKET handles the cases with the small and the large number of similar transaction pairs separately and efficiently. Through experiments, we show that ROCKET achieves high-quality clustering ...
Linear embedding of binary hierarchies and its applications
The discrete binary hierarchy (DBH) is a concept underlying many important issues in analysis of complex systems: knowledge structures, test-and-search organization, evolutionary trees, taxonomy, data handling, etc. It appears that any DBH corresponds to an orthonormal basis of the Euclidean space related to the hierarchy leaves. The properties of these bases form a mathematical framework which c...
Segmentation and alignment of parallel text for statistical machine translation
We address the problem of extracting bilingual chunk pairs from parallel text to create training sets for statistical machine translation. We formulate the problem in terms of a stochastic generative process over text translation pairs, and derive two different alignment procedures based on the underlying alignment model. The first procedure is a now-standard dynamic programming alignment model...
Hierarchical Time-Series Clustering for Data Streams
This paper presents a time-series whole clustering system that incrementally constructs a hierarchy of clusters. The Online Divisive-Agglomerative Clustering (ODAC) system is an incremental implementation of divisive analysis clustering, using the correlation between time series as similarity measure. The system tests existing clusters by descending order of diameters, looking for a possible bina...
On the performance of bisecting K-means and PDDP, Sergio
problem is known as bisecting divisive clustering. Note that by recursively using a divisive bisecting clustering procedure, the dataset can be partitioned into any given number of clusters. Interestingly enough, the clusters so-obtained are structured as a hierarchical binary tree (or a binary taxonomy). This is the reason why the bisecting divisive approach is very attractive in many applicat...